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We present a new multigrid solver that is suitable for the Dirac operator in the presence of 
disordered gauge fields. The key behind the success of the algorithm is an adaptive projection onto 
the coarse grids that preserves the near null space. The resulting algorithm has weak dependence 
on the gauge coupling and exhibits very little critical slowing down in the chiral limit. Results are 
presented for the Wilson Dirac operator of the 2d U(l) Schwinger model. 
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The most demanding computational task in lattice 
QCD simulations consists of the calculation of quark 
propagators, which are needed both for generating gauge 
field configurations with the appropriate measure and for 
the evaluation of most observables. The calculation of 
a quark propagator, which in the course of a simula- 
tion must be carried out innumerous times with vary- 
ing sources and gauge backgrounds, consists in turn of 
solving a very large system of linear equations, 

D(U)i/; = x, (1) 

where i\) is the quark propagator, \ is the source term 
and D(U) is the discretized the Dirac operator matrix, 
with elements dependent on the gauge field background 
U. 

In the language of applied mathematics, Eq. [T|is a dis- 
cretized elliptic partial differential equation (PDE). For 
dcfiniteness, 

d 

D x , y = + 
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is the discretized Dirac operator describing a fermion in 
d dimensions with mass m in the Wilson discretization of 
the Dirac equation. In the full 4 dimensional QCD prob- 
lem the matrices 7 M are the 4x4 Dirac spin matrices 
and U is the SU(3) gauge field. The Wilson discretiza- 
tion is not the only one available, but it enters as a cru- 
cial ingredient in the chirality preserving "overlap" and 
"domain wall" discretizations [UEJOij. Moreover, many 
of the problems encountered in solving the Wilson-Dirac 
equation extend to other formulations, such as the "stag- 
gered" fermion discretization. For these reasons, in this 
paper we will concentrate on the Wilson discretization. 

For any realistic QCD simulation the size of the matrix 
in Eq. [I] is too large for using a direct solver. Iterative 



Krylov-space methods, made possible by the sparsity of 
the matrix, must be used for calculating the propagators 
and very efficient solvers have been developed. Yet, as 
the system being considered grows in size (for forefront 
simulation on a 64 4 lattice, D(U) is a 200M x 200M 
complex matrix) and the quark mass in lattice units is 
brought toward zero, the condition number of the matrix 
increases rapidly and so does the computational cost of 
the solution. 

In the field of applied mathematics it has been known 
for some time that in such circumstances the separation 
of physical length scales can be a very effective paradigm 
for improving the effectiveness of numerical algorithms. 
This paradigm has proven correct whether for evolv- 
ing Monte Carlo processes, modeling chemical reactions, 
or molecular dynamics. This is especially true when it 
comes to solving systems of the form Ax = b, where A 
is the sparse matrix that arises from the discretization of 
continuum differential equations, b is a source vector and 
x is the desired solution vector. For such systems the 
multigrid (MG) approach, where discretizations on suc- 
cessively coarser grids are used to accelerate the solution 
finding process, has proven to be the method to beat. 

One exception to the above statement is in solving the 
Dirac operator in lattice QCD: here the nature of the un- 
derlying gauge field in the Dirac operator has proven to 
be especially resistant to various MG approaches. Previ- 
ous attempts at MG solvers have relied on renormaliza- 
tion group arguments to define the coarse grids without 
realizing why the MG approach succeeds, and this has in- 
variably led to failure as the physically interesting regime 
is approached 4 , 5]. In this letter we demonstrate a MG 
algorithm for the Dirac operator normal equations, i.e., 
the positive definite operator given by 

A = D^D, 

that is shown to work in all regimes and vastly reduces 
the notorious critical slowing of the solver as the renor- 
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malized fermion mass is brought to zero. We do so in the 
context of a 2-dimensional system with U(l) gauge field 
(Schwinger model). This system captures many of the 
physical properties (confinement, chiral symmetry break- 
ing, existence of non-trivial topological sectors) of the 
more complex 4-dimcnsional QCD. 

The original formulation of MG is best viewed with the 
example of the free Dirac operator. Multigrid solvers are 
based on the observation that stationary iterative solvers 
(e.g., Jacobi, Gauss-Seidel) are only effective at reduc- 
ing local error components leaving slow to converge, low 
wave-number components in the error. For the free Dirac 
operator these slow modes will be smooth and can be ac- 
curately represented on a coarser grid using simple linear 
averaging. However, on the coarse grid these low wave- 
number error components become modes of shorter range 
and so relaxation should be effective at removing them. 
This process can continue, moving to coarser and coarser 
grids until we have thinned the degrees of freedom enough 
to solve the system exactly. We then promote our solu- 
tion back to the finest grid, where at each level we relax 
on our correction vector to remove any high wave- number 
error components that are introduced. This process is 
known as a MG V-cycle [6J and can be used as a solver 
in its own right, or more effectively as a preconditioner 
for a Krylov method (e.g., conjugate gradient). 

To help facilitate our discussion we introduce the no- 
tation where the degree of coarseness is represented by 
the integer L, where L — 1 represents the finest grid (i.e., 
where our actual problem is defined) and L = N is the 
coarsest grid in an ./V-level MG algorithm. The operator 
used to promote a coarse grid vector on grid L = I + 1 
to the adjacent fine grid L = I is known as the prolonga- 
tor p( l > l+1 \ and it is convenient to take P = P^ as the 
restriction operator used for moving from the fine grid 
to the coarse grid. (This guarantees Hermiticity of the 
coarse grid operator in Eq. |2j. Typically the Galerkin 
definition is used to define the coarse grid operator [5J, 
A {i+i) = p{l,l+V)\ A {l) p{l,l+i) _ (2) 

That this is the best definition for Hermitian positive 
definite A can easily be found by minimizing the error of 
the coarse grid corrected solution vector in the ^4-norm. 

Apart from the coarsest level which is just an exact 
solve, each level of the MG V-cycle can be succinctly 
described as 

1. Relax on the input vector, x^> = R^b^ l \ where 

is a suitable relaxation operator. 

2. Restrict the resultant residual to the next coarsest 
grid, r(' +1 ) = p(M+i)t(&(0 _ aW>). 



The relaxation operator need not be Hermitian for the entire V- 
cycle to be Hermitian: the post-relaxation operator need only be 
the Hermitian cojugate to pre-relaxation. 



3. Apply the L = I + 1 V-cycle on the coarse residual, 
e (l+i) _ y(J+i) r a+i) > 

4. Correct the current solution with coarse grid cor- 
rection, I P) = I (l) + p(W+i) e (l+i). 

5. Relax on the final residual, x^ = RM(b® -Ax®). 

Written explicitly in terms of operators the I th level of 
the V-cycle thus takes the following form 

V® = R (l) +R m + R^A^R^ + (3) 
(1 - ,R(Oyl(0)p(*.'+l)y(Z+l) 

p(M+i)tn _ a^rW) . 

In this form the Hermiticity of the V-cycle is obvious. 
The cost of applying the MG V-cycle becomes apparent 
from this explicit form: on each level we must apply the 
operator A^ a total of 2v + 2 times for each I, where v 
is the number of steps within the relaxation operator. 

The problem in the early application of the above pro- 
cedure to the interacting theory is that, in the presence of 
a non-trivial gauge field, the eigenvectors responsible for 
slow convergence are no longer low wave-number modes 
with smooth variation over the lattice. They are instead 
modes that exhibit localized lumps, typically extending 
over several lattice spacings. In such circumstances, try- 
ing to use smooth components of the fermion field, de- 
fined through a suitable gauge fixing or by some gauge 
covariant procedure, for the definition of the prolongator 
is bound to produce only a limited advantage. This is 
the method that was followed, e.g., in Ref. [HIS], where 
some acceleration was obtained but critical slowing down 
was not fully removed. 

A breakthrough in the application of multiscalc meth- 
ods to more complex problems, such as the one at hand, 
has occurred with the discovery of adaptive MG tech- 
niques [3 [8]. In the adaptive algorithm one lets the MG 
process itself define the appropriate prolongator by an 
iterative procedure which we now concisely describe. 

In the first pass, one uses relaxation alone to solve the 
homogenous problem Ae = with a randomly chosen ini- 
tial error vector. After a certain number, v, of relaxation 
steps, the relaxation procedure, which we symbolically 
represent by 

e^e' = {I-uAYe = {I-u)D^D) v e, (4) 

produces an e' that essentially belongs to the space 
spanned by the slow modes, so e' is now used to define 
a first approximation to the prolongator P. One blocks 
the variables of the original lattice into subsets, which we 
denote by Sj . From e' we construct the vectors e'j , which 
are identical to e' within Sj and outside Sj, and the 
vectors of unit norm Vij — e'j/\e'j\. The extra "1" index 
in v\j has been introduced for a discussion that follows. 
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The prolongator P 1,2 = P^' 2 which maps a vector tpj 
in the coarse lattice, indexed by j, to the original lattice, 
where i denotes collectively the site, spin and possible 
internal symmetry indices, is then defined by 

P if= v ij,h ( 5 ) 
where we have made explicit the fine lattice indices of 

Vij. 

There are variations on how to block the fine lattice, 
i.e., how to define the sets Sj. In the so called "algebraic 
adaptive MG" one partitions the fine lattice into subsets 
on the basis of the magnitude of the matrix elements of A. 
Since such matrix elements in lattice gauge theories are 
typically of uniform magnitude, differing rather in phase 
or, in a broader sense, in orientation within the space 
of gauge transformations, we chose instead to partition 
the lattice geometrically into fixed blocks of neighboring 
lattice sites, specifically 4x4 squares in our study of 
the Schwinger model. Maintaining a regular lattice on 
coarse levels will allow more efficient parallel code with 
exact load balancing. 

Another refinement of the technique consists of apply- 
ing a simple Richardson iteration to the vectors Vij before 
defining the prolongator. The choice of damping parame- 
ter in this smoothing procedure is chosen to minimize the 
condition number of the resulting coarse grid operator. 
The term "smoothed aggregation" is used for this. Thus 
our overall technique can be referred to as "geometric 
adaptive smoothly aggregated MG" . 

We come now to the crux of the adaptive MG method. 
We use the prolongator defined above (Eq. [5]) to imple- 
ment a standard MG V-cycle and apply it, like relaxation 
before, to a randomly chosen error vector. There are two 
possibilities. Either the V-cycle reduces the error with 
no sign of critical slowing down or some large error, e", 
survives the cycle. In the first case, of course, one need 
not proceed: the MG procedure works as is. In the sec- 
ond case, we define another set of vectors vij over the 
coarse lattice by restricting e" to the subsets Sj, mak- 
ing the new vectors orthogonal to the vectors v\j and 
normalizing them to 1. The smoothed aggregation pro- 
cedure is now applied to the set v s j = {v\j,V2j)- A new 
prolongator is defined by projecting over these vectors 

where the index s, now taking values 1,2, can be consid- 
ered as an intrinsic index over the coarse lattice. 

The procedure described in the above paragraph is re- 
peated as necessary, until the application of a V-cycle 
reduces a random initial error to without critical slow- 
ing down. The method works if critical slowing down is 
eliminated with a few iterations of the adaptive proce- 
dure. If this occurs with M vector sets, then the coarse 
lattice will carry M degrees of freedom per site. As with 
all MG methods, the procedure is recursive and it can be 
used to define further coarsenings. 



In testing this algorithm for lattice QCD we generated 
quenched U(l) gauge field configurations on a 128 x 128 
lattice with the standard Wilson gauge field action 

S= J2 0ReOT= E PTteVZU^fP^U? 

and periodic boundary conditions at — 6 and /3 = 10 
at a wide range of mass parameters. These two values of 
(3 define correlation lengths for the gauge field to be l a = 
3.30 and l a = 4.35 respectively, via the area law for the 
Wilson loop: W ~ exp[— A/l*]. For comparison on these 
lattices, a fermion mass gap rh — m — m crtt = 0.01 cor- 
responds to the pseudoscalar meson correlation lengths 
= 6.4 and pT 1 = 12.7 respectively. 2 In the 2- 
dimensional U(l) gauge theory, one can identify a gauge 
invariant topological charge Q, which in the continuum 
limit is proportional to the quantized magnetic flux flow- 
ing through the system. A gauge field with nonzero Q 
corresponds to a Dirac operator with exactly real eigen- 
values and, hence, as the mass gap is brought towards 
zero the condition number becomes infinite. Thus, it is 
important to test both trivial (Q = 0) and non-trivial 
(Q 7^ 0) gauge field topologies. 

We blocked the lattice into 4x4 blocks and imple- 
mented the adaptive smoothly aggregated MG proce- 
dure described above. We used a degree 2 polynomial 
smoother for our relaxation procedure, where the coef- 
ficients were chosen by running two iterations of an un- 
derrelaxcd minimum residual solver (u) = 0.8) and sub- 
sequently held fixed (hence, for our choice of smoother 
R = Rj). The coarsening procedure was repeated twice 
maintaining M = 8 vectors in all coarsenings, down to 
an 8 x 8 lattice, over which the equations were solved ex- 
actly. For each gauge field we performed the set up pro- 
cedure for the MG preconditioner for the lightest mass 
parameter only, and reused these null space vectors for 
the heavier masses. We used this constructed V-cycle as 
a preconditioner for the conjugate gradient (CG) tech- 
nique where the operator defined in Eq. [4] is applied at 
each iteration to the CG direction vector. 

If one compares the number of CG iterations needed 
to achieve convergence with or without MG precondi- 
tioning, the gain obtained with the MG method is dra- 
matic: for example, with f3 = 6, m = 0.01 and Q = 0, 
it takes 3808 iterations to achieve convergence, in the 
sense above, with a straightforward application of the 
CG technique, whereas it takes only 26 iterations using 
the MG preconditioner. However this comparison does 
not take into account the fact that many more operations 
per iterations must be performed when applying the MG 
preconditioner. To achieve a more balanced comparison, 
in Figs. [T] [2] we plot the total number of applications of D 



2 All quantities are expressed in lattice units. 
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FIG. 1: Number of Dirac operator applications of standard 
CG vs. MG-preconditioned CG solver as function of the 
fermion mass gap at f3 = 6 with topological number Q = 
and Q = 4 (point source, relative solver residual \r\ = 10 -14 ). 
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FIG. 2: Number of Dirac operator applications of standard 
CG vs. MG-preconditioned CG solver as function of the 
fermion mass gap at — 10 with topological number Q = 
and Q = 4 (point source, relative solver residual |r| = 10 -14 ). 



and D' done on the fine lattice. This reflects better the 
actual cost of the calculations (at each iteration of MG- 
CG there are 6 applications of D^D: 1 application in the 
outer CG, and 2 pre- and 2 post- coarsening smoothing 
applications and 1 further application required to form 
the residual) . We do not include the additional cost aris- 
ing from the coarse lattices since this is expected to be 
a small overhead, and has not been optimized for our 
model calculation. The advantage coming from the use 
of the adaptive MG technique is still very dramatic. In 
particular, we see that critical slowing down, if not totally 
eliminated, is very substantially reduced. These results 
are for point sources, however, we tried a variety of differ- 
ent source vectors for this analysis (e.g., Gaussian noise, 
Z4 noise) and found very little dependence of MG-CG 
performance on the source vector. 

From the point of view of computational complexity, 
one should also take into account the cost of setting up 
the MG preconditioner, i.e., of constructing the prolon- 



gator P. This cost is however heavily amortized, to the 
point of being negligible, if, as is often the case, one must 
apply the solver to systems with multiple given vectors 
(for example, solving for all color and spin components 
of a quark propagator or, in the calculation of discon- 
nected diagrams where, 0(1000) inverses are required to 
estimate the trace of the inverse Dirac operator). 

Our results, albeit for now limited to a 2-dimensional 
example, provide a clear indication that adaptive MG 
can be made to work with the lattice Dirac operator. 
What appears to be at the root of its success is that, 
although the modes responsible for slow convergence of 
the Dirac solver on a fine lattice are not low wavenumber 
excitations, like in the free case, their span can be well 
approximated by a set of vectors of limited dimensional- 
ity on the blocks that define the coarse lattice. Earlier 
attempts |U [5] tried to find the approximating subspaces 
on the basis of smoothness, failing to eliminate critical 
slowing down when the pseudoscalar length exceeded the 
disorder length of the gauge field: fi^ 1 > l a . Adaptive 
MG finds the coarse subspaces through the iterative ap- 
plication of the method itself. It is of course crucial that 
the approximation to the space of slow modes can be 
achieved with a small number of vectors on the indi- 
vidual blocks, otherwise the application of the method 
would not be cost effective. But this appears to be the 
case in the examples we studied and, if the results hold 
true in general, adaptive MG has the potential of sub- 
stantially speeding up lattice QCD simulations as the 
increase of available computational power leads one to 
consider ever larger lattices. The observation that the 
space of slow modes may be of limited span is also at 
the root of a method recently proposed by Liischer in 
Ref. (5], although the technique there is quite different 
from the one we follow. The application of the method 
to 4-dimcnsional systems is in progress. 
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